{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c94240f5",
   "metadata": {},
   "source": [
    "# GraphSparqlQAChain\n",
    "\n",
    "Graph databases are an excellent choice for applications based on network-like models. To standardize the syntax and semantics of such graphs, the W3C recommends Semantic Web Technologies, cp. [Semantic Web](https://www.w3.org/standards/semanticweb/). [SPARQL](https://www.w3.org/TR/sparql11-query/) serves as a query language analogously to SQL or Cypher for these graphs. This notebook demonstrates the application of LLMs as a natural language interface to a graph database by generating SPARQL.\\\n",
    "Disclaimer: To date, SPARQL query generation via LLMs is still a bit unstable. Be especially careful with UPDATE queries, which alter the graph."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbc0ee68",
   "metadata": {},
   "source": [
    "There are several sources you can run queries against, including files on the web, files you have available locally, SPARQL endpoints, e.g., [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), and [triple stores](https://www.w3.org/wiki/LargeTripleStores)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "62812aad",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "from langchain.chains import GraphSparqlQAChain\n",
    "from langchain_community.graphs import RdfGraph\n",
    "from langchain_openai import ChatOpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0928915d",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "graph = RdfGraph(\n",
    "    source_file=\"http://www.w3.org/People/Berners-Lee/card\",\n",
    "    standard=\"rdf\",\n",
    "    local_copy=\"test.ttl\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7af596b5",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Note that providing a `local_file` is necessary for storing changes locally if the source is read-only."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58c1a8ea",
   "metadata": {},
   "source": [
    "## Refresh graph schema information\n",
    "If the schema of the database changes, you can refresh the schema information needed to generate SPARQL queries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4e3de44f",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "graph.load_schema()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1fe76ccd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In the following, each IRI is followed by the local name and optionally its description in parentheses. \n",
      "The RDF graph supports the following node types:\n",
      "<http://xmlns.com/foaf/0.1/PersonalProfileDocument> (PersonalProfileDocument, None), <http://www.w3.org/ns/auth/cert#RSAPublicKey> (RSAPublicKey, None), <http://www.w3.org/2000/10/swap/pim/contact#Male> (Male, None), <http://xmlns.com/foaf/0.1/Person> (Person, None), <http://www.w3.org/2006/vcard/ns#Work> (Work, None)\n",
      "The RDF graph supports the following relationships:\n",
      "<http://www.w3.org/2000/01/rdf-schema#seeAlso> (seeAlso, None), <http://purl.org/dc/elements/1.1/title> (title, None), <http://xmlns.com/foaf/0.1/mbox_sha1sum> (mbox_sha1sum, None), <http://xmlns.com/foaf/0.1/maker> (maker, None), <http://www.w3.org/ns/solid/terms#oidcIssuer> (oidcIssuer, None), <http://www.w3.org/2000/10/swap/pim/contact#publicHomePage> (publicHomePage, None), <http://xmlns.com/foaf/0.1/openid> (openid, None), <http://www.w3.org/ns/pim/space#storage> (storage, None), <http://xmlns.com/foaf/0.1/name> (name, None), <http://www.w3.org/2000/10/swap/pim/contact#country> (country, None), <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> (type, None), <http://www.w3.org/ns/solid/terms#profileHighlightColor> (profileHighlightColor, None), <http://www.w3.org/ns/pim/space#preferencesFile> (preferencesFile, None), <http://www.w3.org/2000/01/rdf-schema#label> (label, None), <http://www.w3.org/ns/auth/cert#modulus> (modulus, None), <http://www.w3.org/2000/10/swap/pim/contact#participant> (participant, None), <http://www.w3.org/2000/10/swap/pim/contact#street2> (street2, None), <http://www.w3.org/2006/vcard/ns#locality> (locality, None), <http://xmlns.com/foaf/0.1/nick> (nick, None), <http://xmlns.com/foaf/0.1/homepage> (homepage, None), <http://creativecommons.org/ns#license> (license, None), <http://xmlns.com/foaf/0.1/givenname> (givenname, None), <http://www.w3.org/2006/vcard/ns#street-address> (street-address, None), <http://www.w3.org/2006/vcard/ns#postal-code> (postal-code, None), <http://www.w3.org/2000/10/swap/pim/contact#street> (street, None), <http://www.w3.org/2003/01/geo/wgs84_pos#lat> (lat, None), <http://xmlns.com/foaf/0.1/primaryTopic> (primaryTopic, None), <http://www.w3.org/2006/vcard/ns#fn> (fn, None), <http://www.w3.org/2003/01/geo/wgs84_pos#location> (location, None), <http://usefulinc.com/ns/doap#developer> (developer, None), <http://www.w3.org/2000/10/swap/pim/contact#city> (city, None), <http://www.w3.org/2006/vcard/ns#region> (region, None), <http://xmlns.com/foaf/0.1/member> (member, None), <http://www.w3.org/2003/01/geo/wgs84_pos#long> (long, None), <http://www.w3.org/2000/10/swap/pim/contact#address> (address, None), <http://xmlns.com/foaf/0.1/family_name> (family_name, None), <http://xmlns.com/foaf/0.1/account> (account, None), <http://xmlns.com/foaf/0.1/workplaceHomepage> (workplaceHomepage, None), <http://purl.org/dc/terms/title> (title, None), <http://www.w3.org/ns/solid/terms#publicTypeIndex> (publicTypeIndex, None), <http://www.w3.org/2000/10/swap/pim/contact#office> (office, None), <http://www.w3.org/2000/10/swap/pim/contact#homePage> (homePage, None), <http://xmlns.com/foaf/0.1/mbox> (mbox, None), <http://www.w3.org/2000/10/swap/pim/contact#preferredURI> (preferredURI, None), <http://www.w3.org/ns/solid/terms#profileBackgroundColor> (profileBackgroundColor, None), <http://schema.org/owns> (owns, None), <http://xmlns.com/foaf/0.1/based_near> (based_near, None), <http://www.w3.org/2006/vcard/ns#hasAddress> (hasAddress, None), <http://xmlns.com/foaf/0.1/img> (img, None), <http://www.w3.org/2000/10/swap/pim/contact#assistant> (assistant, None), <http://xmlns.com/foaf/0.1/title> (title, None), <http://www.w3.org/ns/auth/cert#key> (key, None), <http://www.w3.org/ns/ldp#inbox> (inbox, None), <http://www.w3.org/ns/solid/terms#editableProfile> (editableProfile, None), <http://www.w3.org/2000/10/swap/pim/contact#postalCode> (postalCode, None), <http://xmlns.com/foaf/0.1/weblog> (weblog, None), <http://www.w3.org/ns/auth/cert#exponent> (exponent, None), <http://rdfs.org/sioc/ns#avatar> (avatar, None)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "graph.get_schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68a3c677",
   "metadata": {},
   "source": [
    "## Querying the graph\n",
    "\n",
    "Now, you can use the graph SPARQL QA chain to ask questions about the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7476ce98",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [],
   "source": [
    "chain = GraphSparqlQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ef8ee27b",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
      "Identified intent:\n",
      "\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
      "Generated SPARQL:\n",
      "\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
      "SELECT ?homepage\n",
      "WHERE {\n",
      "    ?person foaf:name \"Tim Berners-Lee\" .\n",
      "    ?person foaf:workplaceHomepage ?homepage .\n",
      "}\u001b[0m\n",
      "Full Context:\n",
      "\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\""
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.run(\"What is Tim Berners-Lee's work homepage?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af4b3294",
   "metadata": {},
   "source": [
    "## Updating the graph\n",
    "\n",
    "Analogously, you can update the graph, i.e., insert triples, using natural language."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "fdf38841",
   "metadata": {
    "pycharm": {
     "is_executing": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
      "Identified intent:\n",
      "\u001b[32;1m\u001b[1;3mUPDATE\u001b[0m\n",
      "Generated SPARQL:\n",
      "\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
      "INSERT {\n",
      "    ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
      "}\n",
      "WHERE {\n",
      "    ?person foaf:name \"Timothy Berners-Lee\" .\n",
      "}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'Successfully inserted triples into the graph.'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.run(\n",
    "    \"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e0f7fc1",
   "metadata": {},
   "source": [
    "Let's verify the results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "f874171b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(rdflib.term.URIRef('https://www.w3.org/'),),\n",
       " (rdflib.term.URIRef('http://www.w3.org/foo/bar/'),)]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = (\n",
    "    \"\"\"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\\n\"\"\"\n",
    "    \"\"\"SELECT ?hp\\n\"\"\"\n",
    "    \"\"\"WHERE {\\n\"\"\"\n",
    "    \"\"\"    ?person foaf:name \"Timothy Berners-Lee\" . \\n\"\"\"\n",
    "    \"\"\"    ?person foaf:workplaceHomepage ?hp .\\n\"\"\"\n",
    "    \"\"\"}\"\"\"\n",
    ")\n",
    "graph.query(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb00a625-a6c9-4766-b3f0-eaed024851c9",
   "metadata": {},
   "source": [
    "## Return SQARQL query\n",
    "You can return the SPARQL query step from the Sparql QA Chain using the `return_sparql_query` parameter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "f13e2865-176a-4417-95e6-db818b214d08",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = GraphSparqlQAChain.from_llm(\n",
    "    ChatOpenAI(temperature=0), graph=graph, verbose=True, return_sparql_query=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "4f4d47b6-4202-4e74-8c88-aeaac5344c04",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
      "Identified intent:\n",
      "\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
      "Generated SPARQL:\n",
      "\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
      "SELECT ?workHomepage\n",
      "WHERE {\n",
      "    ?person foaf:name \"Tim Berners-Lee\" .\n",
      "    ?person foaf:workplaceHomepage ?workHomepage .\n",
      "}\u001b[0m\n",
      "Full Context:\n",
      "\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "SQARQL query: PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
      "SELECT ?workHomepage\n",
      "WHERE {\n",
      "    ?person foaf:name \"Tim Berners-Lee\" .\n",
      "    ?person foaf:workplaceHomepage ?workHomepage .\n",
      "}\n",
      "Final answer: Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\n"
     ]
    }
   ],
   "source": [
    "result = chain(\"What is Tim Berners-Lee's work homepage?\")\n",
    "print(f\"SQARQL query: {result['sparql_query']}\")\n",
    "print(f\"Final answer: {result['result']}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "be3d9ff7-dc00-47d0-857d-fd40437a3f22",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
      "SELECT ?workHomepage\n",
      "WHERE {\n",
      "    ?person foaf:name \"Tim Berners-Lee\" .\n",
      "    ?person foaf:workplaceHomepage ?workHomepage .\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(result[\"sparql_query\"])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
